164 research outputs found

Case-based User Profiling in a Personal Travel Assistant

Author: DW Aha
Foundation for Intelligent Physical Agents
M Lenz
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/1999

Crossref

Autonomic self healing and recovery informed by environment knowledge

Author: A Ganek
C Carrick
C Marling
D McSherry
David Bustard
David McSherry
DW Aha
I Watson
J Giampapa
R López de Mántaras
R Telford
S Montani
Sa’adah Hassan
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Determining appropriate approaches for using data in feature selection

Author: A Kalousis
C Ambroise
DW Aha
F Wilcoxon
G Chandrashekar
H Liu
J Reunanen
JC Platt
JR Quinlan
L Yu
M Lecocke
MA Hall
P Somol
V Bolón-Canedo
Y Han
Y Saeys
Z He
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/12/2015

Feature selection is increasingly important in data analysis and machine learning in the big data era. However, how to use the data in feature selection, i.e. using either ALL or PART of a dataset, has become a serious and tricky issue. Whilst the conventional practice of using all the data in feature selection may lead to selection bias, using part of the data may, on the other hand, lead to underestimating the relevant features under some conditions. This paper investigates these two strategies systematically in terms of reliability and effectiveness, and then determines their suitability for datasets with different characteristics. Reliability is measured by the Average Tanimoto Index and the Inter-method Average Tanimoto Index, and effectiveness is measured by the mean generalisation accuracy of classification. The computational experiments are carried out on ten real-world benchmark datasets and fourteen synthetic datasets. The synthetic datasets are generated with a pre-set number of relevant features and varied numbers of irrelevant features and instances, with different levels of noise added. The results indicate that the PART approach is more effective in reducing the bias when the dataset is small but starts to lose its advantage as the dataset size increases.
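The abstract names the Average Tanimoto Index as its reliability measure but does not define it. The following is a minimal sketch, assuming the common set-based form in which the Tanimoto similarity of two feature subsets is the size of their intersection divided by the size of their union, averaged over all pairs of subsets. The function names and the toy feature names are illustrative and not taken from the paper.

```python
from itertools import combinations


def tanimoto(a, b):
    """Set-based Tanimoto similarity: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty subsets are treated as identical (an assumption).
        return 1.0
    return len(a & b) / len(union)


def average_tanimoto_index(subsets):
    """Mean pairwise Tanimoto similarity over a list of feature subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature subsets selected in three runs of a selector.
runs = [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}, {"f1", "f2", "f3"}]
ati = average_tanimoto_index(runs)
```

A value close to 1 means the selector returns nearly the same features on every run; a value close to 0 means the selections barely overlap.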

Crossref

Springer - Publisher Connector

University of East Anglia digital repository

Seir immune strategy for instance weighted naive bayes classification

Author: DW Aha
GI Webb
J Wu
JR Quinlan
L Jiang
N Friedman
SB Kim
T Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

© Springer International Publishing Switzerland 2015. Naive Bayes (NB) has been widely applied in many classification tasks. However, in real-world applications, the pronounced advantage of NB is often challenged by insufficient training samples. Specifically, high variance may arise when the number of training samples is limited. The estimated class distribution of an NB classifier is inaccurate if the number of training instances is small. To handle this issue, in this paper, we propose a SEIR (Susceptible, Exposed, Infectious and Recovered) immune-strategy-based instance weighting algorithm for naive Bayes classification, namely SWNB. The immune instance weighting allows the SWNB algorithm to adjust itself to the data without explicit specification of functional or distributional forms of the underlying model. Experiments and comparisons on 20 benchmark datasets demonstrated that the proposed SWNB algorithm outperformed existing state-of-the-art instance-weighted NB algorithms and other related computational intelligence methods.

Crossref

OPUS - University of Technology Sydney

A Comparison of Machine Learning Methods for Cross-Domain Few-Shot Learning

Author: DW Aha
E Frank
GJ McLachlan
H Ismail Fawaz
L Bossard
L Breiman
P Helber
P Tschandl
PL Bartlett
R Tibshirani
RJ Durrant
S le Cessie
S Shalev-Shwartz
SJ Pan
SP Mohanty
Y Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

We present an empirical evaluation of machine learning algorithms in cross-domain few-shot learning based on a fixed pre-trained feature extractor. Experiments were performed in five target domains (CropDisease, EuroSAT, Food101, ISIC and ChestX) and using two feature extractors: a ResNet10 model trained on a subset of ImageNet known as miniImageNet and a ResNet152 model trained on the ILSVRC 2012 subset of ImageNet. Commonly used machine learning algorithms including logistic regression, support vector machines, random forests, nearest neighbour classification, naïve Bayes, and linear and quadratic discriminant analysis were evaluated on the extracted feature vectors. We also evaluated classification accuracy when subjecting the feature vectors to normalisation using p-norms. Algorithms originally developed for the classification of gene expression data (the nearest shrunken centroid algorithm and LDA ensembles obtained with random projections) were also included in the experiments, in addition to a cosine similarity classifier that has recently proved popular in few-shot learning. The results enable us to identify algorithms, normalisation methods and pre-trained feature extractors that perform well in cross-domain few-shot learning. We show that the cosine similarity classifier and ℓ2-regularised 1-vs-rest logistic regression are generally the best-performing algorithms. We also show that algorithms such as LDA yield consistently higher accuracy when applied to ℓ2-normalised feature vectors. In addition, all classifiers generally perform better when extracting feature vectors using the ResNet152 model instead of the ResNet10 model.
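To illustrate the idea of classifying ℓ2-normalised feature vectors by cosine similarity, here is a minimal sketch. It assumes a simplified prototype variant: each class is represented by the normalised mean of its normalised support vectors, and a query is assigned to the class whose prototype has the highest cosine similarity. The abstract does not specify how the paper's cosine similarity classifier is built or trained, so this is not the authors' implementation; the function names and toy vectors are hypothetical.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fit_prototypes(features, labels):
    """Build one unit-length prototype per class from normalised support vectors."""
    sums = {}
    for vec, label in zip(features, labels):
        unit = l2_normalise(vec)
        acc = sums.setdefault(label, [0.0] * len(unit))
        for i, x in enumerate(unit):
            acc[i] += x
    # Normalising the summed vectors gives the direction of the class mean.
    return {label: l2_normalise(acc) for label, acc in sums.items()}


def predict(prototypes, vec):
    """Return the label whose prototype has the highest cosine similarity to vec."""
    unit = l2_normalise(vec)

    def cosine(label):
        # Both vectors have unit length, so the dot product is the cosine.
        return sum(p * x for p, x in zip(prototypes[label], unit))

    return max(prototypes, key=cosine)


# Hypothetical 2-D "extracted features" for a two-class few-shot task.
support = [[1.0, 0.0], [2.0, 0.2], [0.0, 1.0], [0.1, 3.0]]
support_labels = ["a", "a", "b", "b"]
prototypes = fit_prototypes(support, support_labels)
```

Because every vector is normalised first, only its direction matters: `[2.0, 0.2]` and `[20.0, 2.0]` are treated identically.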

Crossref

Research Commons@Waikato

Edinburgh Research Explorer

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Detecting Machine-obfuscated Plagiarism

Author: A Altheneyan
AM Rogerson
C Cortes
D Weber-Wulff
DW Aha
F Alvi
FM Prentice
JD Velásquez
L Breiman
M Franco-Salvador
M Mohebbi
N Meuschke
P Bojanowski
P McCullagh
Q Madera
S Deerwester
T Eisa
T Foltýnek
T Yokoi
V Kanjirangat
Y Goldberg
Publication date: 13/12/2019

Related dataset is at https://doi.org/10.7302/bewj-qx93 and also listed in the dc.relation field of the full item record. Research on academic integrity has identified online paraphrasing tools as a severe threat to the effectiveness of plagiarism detection systems. To enable the automated identification of machine-paraphrased text, we make three contributions. First, we evaluate the effectiveness of six prominent word embedding models in combination with five classifiers for distinguishing human-written from machine-paraphrased text. The best performing classification approach achieves an accuracy of 99.0% for documents and 83.4% for paragraphs. Second, we show that the best approach outperforms human experts and established plagiarism detection systems for these classification tasks. Third, we provide a Web application that uses the best performing classification approach to indicate whether a text underwent machine-paraphrasing. The data and code of our study are openly available. Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/152346/1/Foltynek2020_Paraphrase_Detection.pdf

Crossref

Deep Blue Documents at the University of Michigan

Physiological wireless sensor network for the detection of human moods to enhance human-robot interaction

Author: A Burns
AM Isen
B Mali
BW White
DW Aha
F-C Kao
Filippo Cavallo
GI Webb
H-M Wang
J-H Kim
Katharina Lochner
M Bradley
M Chen
N Lippman
RW Picard
S Koelstra
S Kreibig
SS Keerthi
T Tamura
UR Acharya
W Boucsein
William W. Cohen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Crossref

Florence Research

Cross validation of bi-modal health-related stress assessment

Author: A Marty
A Tawari
B Arnrich
B Kedem
B Schuller
B Schölkopf
D Morrison
D Ververidis
DA Craig
DF Tolin
DM Hilty
DR Ladd
DW Aha
EB Baum
Egon L. van den Broek
EL Broek van den
EN Khalil
F Pallavicini
Frans van der Sluis
IR Murray
J Blascovich
J Krumm
J Sánchez-Meca
J Wolpe
JA Healey
K Domschke
K Nieuwenhuijsen
KR Scherer
LK Hansen
LM Blainlow
M El Ayadi
M Hall
MD Zwaag van der
MG Newman
N Rüscha
P Rani
PL Bartlett
R Banse
R Cowie
R Likert
RB Fillingim
RC Kessler
RG Lyons
RW Picard
S Wu
T Shimamura
TM Cover
Ton Dijkstra
TR Kosten
Publication venue: Springer Verlag
Publication date: 01/01/2011

This study explores the feasibility of objective and ubiquitous stress assessment. 25 post-traumatic stress disorder patients participated in a controlled storytelling (ST) study and an ecologically valid reliving (RL) study. The two studies were meant to represent an early and a late therapy session, and each consisted of a "happy" and a "stress triggering" part. Two instruments were chosen to assess the stress level of the patients at various points in time during therapy: (i) speech, used as an objective and ubiquitous stress indicator and (ii) the subjective unit of distress (SUD), a clinically validated Likert scale. In total, 13 statistical parameters were derived from each of five speech features: amplitude, zero-crossings, power, high-frequency power, and pitch. To model the emotional state of the patients, 28 parameters were selected from this set by means of a linear regression model and, subsequently, compressed into 11 principal components. The SUD and speech model were cross-validated, using 3 machine learning algorithms. Between 90% (2 SUD levels) and 39% (10 SUD levels) correct classification was achieved. The two sessions could be discriminated in 89% (for ST) and 77% (for RL) of the cases. This report fills a gap between laboratory and clinical studies, and its results emphasize the usefulness of Computer Aided Diagnostics (CAD) for mental health care.
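Two of the speech features listed above, zero-crossings and power, have simple textbook definitions that can be sketched briefly. The code below assumes a frame of signed audio samples, counts sign changes between consecutive samples, and takes power as the mean of the squared samples. The abstract does not give the study's exact feature definitions or framing parameters, so the function names and the sample frame are illustrative only.

```python
def zero_crossings(samples):
    """Count sign changes between consecutive samples (zero is treated as non-negative)."""
    count = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev < 0 <= cur) or (prev >= 0 > cur):
            count += 1
    return count


def mean_power(samples):
    """Mean of the squared sample values in the frame."""
    if not samples:
        raise ValueError("frame must contain at least one sample")
    return sum(x * x for x in samples) / len(samples)


# Hypothetical four-sample frame alternating around zero.
frame = [0.5, -0.5, 0.5, -0.5]
crossings = zero_crossings(frame)
power = mean_power(frame)
```

Statistics such as the mean or variance of these per-frame values over a recording would then yield the kind of statistical parameters the abstract mentions.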

Crossref

Springer - Publisher Connector

Copenhagen University Research Information System

Radboud Repository

University of Twente Research Information

Fire detection from social media images by means of instance-based learning

Author: BC Ko
BS Manjunath
D Tjondronegoro
DW Aha
I Guyon
K Dimitropoulos
K Wnukowicz
KL Lee
M Doeller
M Sato
P Zezula
S Kudyba
S Sumathi
T Celik
T Sikora
Y Chunyu
YN Silva
Publication venue: Cham
Publication date: 01/01/2015

Social media can provide valuable information to support decision making in crisis management, such as in accidents, explosions, and fires. However, much of the data from social media are images, which are uploaded at a rate that makes it impossible for human beings to analyze them. To cope with that problem, we design and implement a database-driven architecture for fast and accurate fire detection named FFireDt. The design of FFireDt uses instance-based learning through indexed similarity queries expressed as an extension of the relational Structured Query Language. Our contributions are: (i) the design of the Fast-Fire Detection (FFireDt), which achieves efficiency and efficacy rates that rival state-of-the-art techniques; (ii) the sound evaluation of 36 image descriptors for the task of image classification in social media; (iii) the evaluation of content-based indexing with respect to the construction of instance-based classification systems; and (iv) the curation of a ground-truth annotated dataset of fire images from social media. Using real data from Flickr, the experiments showed that the FFireDt system was able to achieve a precision for fire detection comparable to that of human annotators. Our results are promising for the engineering of systems to monitor images uploaded to social media services. FAPESP; CNPq; CAPES; STIC-AmSud; RESCUER project, funded by the European Commission (Grant: 614154) and by the CNPq/MCTI (Grant: 490084/2013-3). International Conference on Enterprise Information Systems - ICEIS (17. 2015 Barcelona

Crossref

Universidade de São Paulo